College Board Report No. 98-2 
ETS RR No. 98-9 


Inquiring About 
Examinees' Ethnicity 
and Sex: Effects on 
Computerized Placement 
Tests™ Performance 


LAWRENCE J. STRICKER 
and WILLIAM C. WARD 


College Entrance Examination Board, New York, 1998 



Lawrence J. Stricker is a Principal Research Scientist 
and William C. Ward is a Senior Development Leader 
at ETS. 


Researchers are encouraged to freely express their pro- 
fessional judgment. Therefore, points of view or opin- 
ions stated in College Board Reports do not necessarily 
represent official College Board position or policy. 


Founded in 1900, the College Board is a not-for-profit 
educational association that supports academic 
preparation and transition to higher education for 
students around the world through the ongoing collab- 
oration of its member schools, colleges, universities, 
educational systems and organizations. 

In all of its activities, the Board promotes equity 
through universal access to high standards of teaching 
and learning and sufficient financial resources so that 
every student has the opportunity to succeed in college 
and work. 

The College Board champions, by means of superior 
research; curricular development; assessment; guidance, 
placement, and admission information; professional 
development; forums; policy analysis; and public 
outreach, educational excellence for all students. 


Additional copies of this report may be obtained from 
College Board Publications, Box 886, New York, New 
York 10101-0886. The price is $15. Please include $4 
for postage and handling. 


Copyright © 1998 by College Entrance Examination 
Board and Educational Testing Service. All rights 
reserved. College Board, Advanced Placement 
Program, AP, SAT, and the acorn logo are registered 
trademarks of the College Entrance Examination 
Board. ACCUPLACER, Computerized Placement 
Tests, and CPTs are trademarks owned by the 
College Entrance Examination Board. 


Acknowledgments 

Thanks are due to Central Piedmont Community College 
for cooperating in the study, David A. Rhoden for 
coordinating the data collection, Margaret L. Redman 
for preparing the data for analysis, Laura M. Jenkins 
and Xuefei Hui for doing the computer analysis, and 
Rick Morgan and Gita Z. Wilder for reviewing a draft 
of this report. 


Printed in the United States of America. 



Contents 

Abstract 
Introduction 
Method 
    Sample 
    Procedure 
    Measures 
    Analysis 
Results and Discussion 
    Intercorrelations 
    Analyses of Covariance 
Conclusions 

Tables 

1. Summary of Sample Characteristics 
2. Intercorrelations of Scores and Times on Tests 
3. Summary of Analyses of Covariance of Scores and Times on Tests 
4. Mean Scores and Times on Tests for Ethnic Groups and Men and Women 

References 
Appendix 






Abstract 

Laboratory experiments by Steele and Aronson (1995) 
found that African-American subjects' performance on 
difficult verbal items, described as a verbal problem- 
solving task, was adversely affected when they were 
asked about their ethnicity just before working on the 
items. These results were attributed to stereotype threat: 
asking about ethnicity primes African-American sub- 
jects' concerns about fulfilling the negative ethnic 
stereotype about their intellectual ability, thereby dis- 
rupting test performance. The present field experiment 
assessed the effects of asking community college stu- 
dents taking the Computerized Placement Tests™ 
(CPTs™), in an actual operational setting, about their 
ethnicity and sex. This inquiry had no statistically and 
practically significant effects on how well the examinees 
did on the tests or how long they worked on the tests. 


Introduction 

Two experiments by Steele and Aronson (1995), done 
with Stanford undergraduates, found that African-American 
subjects' performance on difficult GRE General 
Test (Briel, O'Neill, and Scheuneman, 1993) verbal 
items was adversely affected when they were asked 
about their ethnicity just before they began working on 
the items, while white subjects' performance was unaffected. 
The African-American subjects who were asked 
about their ethnicity answered fewer items correctly, 
answered correctly a smaller percentage of attempted 
items, attempted fewer items, and spent more time 
working on the items. The purpose of the experiments 
was described to the subjects as "nondiagnostic": to 
understand the psychological factors involved in solving 
verbal problems; individuals' ability was not being evaluated, 
though they could receive feedback about their 
performance. Steele and Aronson explain their results as 
coming about because asking about ethnicity primes 
African-American subjects' concerns about fulfilling the 
negative ethnic stereotype about their intellectual ability, 
thereby disrupting the subjects' test performance. (See 
also Steele, 1997.) Other research by Spencer, Steele, and 
Quinn (1997) suggests that this same kind of "stereotype 
threat" affects the performance of women on quantitative 
test items, given the negative stereotype about women's 
ability in this sphere. 

The Steele and Aronson studies have obvious parallels 
with the test administration procedures for the 
Computerized Placement Tests (College Board, 1995; 
Ward, 1988) and Advanced Placement (AP®) Examinations 
(College Board and Educational Testing Service, 
1995), as well as for other standardized tests that require 
examinees to answer questions about their ethnicity 
and sex just before they take the tests. But the 
Steele and Aronson studies and the CPTs and AP procedures 
may also differ in important respects. First, the 
Steele and Aronson subjects were taking the items for 
research purposes, whereas CPTs and AP examinees 
take the tests for important personal reasons (to guide 
their course placement or to get advanced credit in college) 
and hence may be more motivated to do well on 
the test material. Second, the experimental task in the 
Steele and Aronson studies was portrayed as innocuous 
problem solving, whereas CPTs and AP examinees are 
aware that they are taking tests that reflect their mastery 
of important academic skills or specific course content. 
Steele and Aronson have also found that stereotype 
threat is heightened when the experimental task is described 
as diagnostic of the subjects' intellectual ability. 
Thus, inquiring about ethnicity and sex may have a limited 
impact on CPTs and AP Examinations insofar as 
stereotype threat is already elevated by examinees' perceptions 
of these tests as diagnostic.¹ Third, Steele and 
Aronson theorize that stereotype threat only affects examinees 
who identify with the subject matter being 
tested. Although AP examinees may be very involved 
with the academic skills being tested, CPTs examinees 
are probably less involved. Fourth, research by Spencer 
et al. suggests that an important element in the operation 
of stereotype threat is subjects' perceptions of the 
items as difficult, at the limits of their ability; it is unclear 
whether CPTs and AP examinees perceive the tests 
in this way. 

A recent study (Stricker, 1998) evaluated the applicability 
of the Steele and Aronson results to the AP Calculus AB 
Examination (College Board, 1994), and to 
girls as well as African-American examinees. This test 
was chosen for investigation because it is taken by relatively 
large numbers of African-American examinees as 
well as girls; substantial differences exist in the test performance 
of African Americans and whites and of girls 
and boys; and the subject matter of the test is pertinent 
to the stereotype about females' quantitative ability as 
well as to the stereotype about African-Americans' 
ability in general. The test administration was modified 
for a random sample of schools by masking demographic 
questions on the standard answer sheet and 
distributing the standard answer sheet after the test to 
obtain answers to these questions. Comparisons of the 
examinees in these classes with examinees in a random 
sample of classes that received the standard answer sheet 
generally found no differences for African-American, 
female, or other subgroups of examinees on the kinds of 
measures of test performance used by Steele and 
Aronson. (Time measures could not be obtained for this 
group-administered, conventional paper-and-pencil test.) 

¹In an unpublished pilot study by Steele, inquiring about ethnicity 
did not affect the performance of African-American subjects 
when they were told that the experimental task was diagnostic 
(C. M. Steele, personal communication, May 21, 1997). 

Differences between the AP Calculus AB Examination 
and CPTs in their content and purpose, as well as 
in the respective test-taking populations, make the generalizability 
of the AP results to the CPTs uncertain. 
Hence the aim of the present study was to replicate the 
Stricker investigation of the AP Examination, assessing 
the effects of asking about CPTs examinees' ethnicity 
and sex on their scores on the tests and the time that 
they spent on the tests. 


Method 

Sample 

The sample consisted of all incoming students at Central 
Piedmont Community College, Charlotte, North Carolina, 
who took the CPTs for the first time during a four-week 
period from August 12 to September 7, 1996. The 
total sample was 1,341: 333 white men, 249 African-American 
men, 65 other men, 391 white women, 219 
African-American women, and 84 other women. The experimental 
group consisted of 632 subjects who took the 
CPTs during the two middle weeks of August 20 and 
August 26; the control group consisted of 709 subjects 
who took the tests the first week, that of August 12, or the 
last week, that of September 3. (One examinee in the experimental 
group whose ethnicity could not be ascertained 
and seven subjects in the control group who took the 
CPTs with the test administration procedures for the experimental 
group were excluded from the sample.) The 
sizes of the experimental and control groups vary from 
test to test because examinees did not necessarily take all of the 
CPTs. The sample size for each test was 1,176 for 
Elementary Algebra, 1,238 for Arithmetic, 1,044 for 
Reading Comprehension, and 1,073 for Sentence Skills. 
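
The sample arithmetic above can be checked with a short sketch. It is only an illustration: the counts are transcribed from the Sample paragraph and from Table 1, and the variable names are hypothetical.

```python
# Counts transcribed from the Sample paragraph and Table 1 of this report.
# All names here are illustrative, not part of the original analysis.
subgroups = {
    ("white", "men"): 333,
    ("African American", "men"): 249,
    ("other", "men"): 65,
    ("white", "women"): 391,
    ("African American", "women"): 219,
    ("other", "women"): 84,
}
group_sizes = {"experimental": 632, "control": 709}

# Examinees taking each test: (experimental, control), from Table 1.
test_takers = {
    "Elementary Algebra": (561, 615),
    "Arithmetic": (582, 656),
    "Reading Comprehension": (487, 557),
    "Sentence Skills": (488, 585),
}

# The six subgroups and the two groups should both add up to the total sample.
total_from_subgroups = sum(subgroups.values())
total_from_groups = sum(group_sizes.values())

# Per-test sample size is the sum of the two groups' counts for that test.
per_test_n = {name: exp + con for name, (exp, con) in test_takers.items()}

print(total_from_subgroups, total_from_groups)
for name, n in per_test_n.items():
    print(f"{name}: {n}")
```

Summing the Table 1 counts this way gives the per-test sample sizes directly from the group counts.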

The characteristics of the total sample are summarized 
in Table 1. The experimental and control groups were 
comparable in ethnicity (52.0 percent and 55.7 percent 
white, and 38.1 percent and 32.0 percent African American),² 
sex (46.8 percent and 49.5 percent men), age (45.6 
percent and 51.5 percent 19 years old or under), and intended 
program (27.8 percent and 32.4 percent associate's 
degree in arts and science, and 34.3 percent and 
36.0 percent associate's degree in a vocational field). 
Over 85 percent took the CPTs quantitative tests (88.8 
percent and 86.7 percent for Elementary Algebra; 92.1 
percent and 92.5 percent for Arithmetic), and 75 percent 
to 85 percent took the CPTs verbal tests (77.1 percent 
and 78.6 percent for Reading Comprehension; 77.2 percent 
and 82.5 percent for Sentence Skills). 

²Other ethnic groups were pooled in the study because of their 
small number. 


Table 1 

Summary of Sample Characteristics 

                                              Experimental        Control 
                                               (N = 632)         (N = 709) 
Variable                                      N       %         N       % 

Ethnicity 
  White                                       329     52.0      395     55.7 
  African American                            241     38.1      227     32.0 
  Other                                       62      9.8       87      12.3 
Sex 
  Male                                        296     46.8      351     49.5 
  Female                                      336     53.2      358     50.5 
Age 
  19 or under                                 288     45.6      365     51.5 
  20 to 24                                    166     26.3      177     25.0 
  25 to 29                                    65      10.3      76      10.7 
  30 to 34                                    46      7.3       29      4.1 
  35 or more                                  65      10.3      58      8.2 
  Not ascertained                             2       .3        4       .6 
Intended Program 
  Associate's degree in arts and science      176     27.8      230     32.4 
  Associate's degree in a vocational field    217     34.3      255     36.0 
  Undecided                                   33      5.2       24      3.4 
  Not ascertained                             54      8.5       11.21   20.19 
CPTs Test 
  Elementary Algebra                          561     88.8      615     86.7 
  Arithmetic                                  582     92.1      656     92.5 
  Reading Comprehension                       487     77.1      557     78.6 
  Sentence Skills                             488     77.2      585     82.5 

Note: Percentages may not add up to 100.0 because of rounding 
error. 

Procedure 

Students routinely scheduled to take the CPTs at the 
college's testing center, before beginning their course 
work in the Fall 1996 semester, were directed to the 16 
personal computers regularly used in administering the 
CPTs. For the experimental group (examinees tested in 
the weeks of August 20 and August 26), the initial computer 
screens containing the demographic questions 
were eliminated on all computers, and a paper-and-pencil 
questionnaire with these questions was administered 
after the CPTs were completed. (A copy of the 
questionnaire appears in the Appendix.) No other 
changes were made in the test administration. For the 
control group (examinees tested in the weeks of August 
12 and September 3), all the regular test administration 
procedures were followed, including the presentation 
on all computers of the initial computer screens with the 
demographic questions. 

Measures 

CPTs 

The CPTs consist of four tests: Elementary Algebra, 
Arithmetic, Reading Comprehension, and Sentence 
Skills. The CPTs are computer adaptive tests, and the 
same number of items, 12 to 20, depending on the test, 
are administered to all examinees. Examinees are re- 
quired to attempt every item presented to them, and there 
is no penalty for guessing. The DOS 4.5 version of the 
CPTs was used. Two scores were obtained for each test: 

1. The regular Total Right Score. This score is an estimate 
of the number of items that the examinee would 
answer correctly in the original pool of 120 items for 
each test. 

2. The total time (in seconds) spent on the items. (Times 
for Elementary Algebra and Arithmetic were unavailable 
for one examinee.) 

Other measures of test performance in the Stricker 
(1997) study and the Steele and Aronson (1995) re- 
search based on the number of attempted, omitted, or 
not reached items could not be obtained because exam- 
inees must answer all items. 

Other variables 

Ethnicity, sex, and other background variables were obtained 
from the CPTs electronic records or the paper-and-pencil 
questionnaire. In cases where ethnicity and 
sex were not reported, this information was obtained 
from school records. 

Analysis 

The product-moment intercorrelations of the scores and 
times for the four tests were computed separately for 
the experimental and control groups, using a pairwise 
missing data program. 
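
A minimal sketch of what pairwise handling of missing data means for a product-moment correlation follows. The report does not name the program that was used; the function name and the sample values here are hypothetical.

```python
import math


def pairwise_pearson(x, y):
    """Pearson r computed only over cases where both values are present.

    Missing values are represented by None. Returns (r, n_pairs_used).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical scores and times; None marks a value that is not available.
algebra_score = [52, 40, None, 61, 47]
algebra_time = [700, 610, 650, None, 640]
r, n_used = pairwise_pearson(algebra_score, algebra_time)

# A second made-up pair of variables with an exact linear relation.
r_linear, n_linear = pairwise_pearson([1, 2, 3, None], [2, 4, 6, 8])
```

Because each correlation uses every case available for that particular pair of variables, the number of cases can differ from one correlation to the next, which is why the Ns reported with Table 2 vary.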

A series of 2 (Experimental versus Control) × 3 (Ethnicity: 
White, African American, Other) × 2 (Sex) factorial 
analyses of covariance of the eight scores and 
times was carried out, using the least-squares method 
(Model II error term; Overall and Spiegel, 1969) to deal 
with unequal Ns. Sixteen covariates were used. In cases 
where the data for a covariate were not reported or 
were unquantifiable (ranging from .4 percent for Age to 
21.7 percent for Father's Education), the mean or 
modal response for examinees of the same ethnic group 
and sex in the same experimental or control group was 
substituted. Questions with open-ended response alternatives 
(e.g., "seven or more years") were dichotomized 
at the median of the distributions. Intended program: 
Diploma in vocational field was excluded to eliminate 
the dependency among the four Intended program 
dummy variables. The covariates follow: 

1. Age (in years) 

2. Father's education (high school graduate or less=0, 
some college or more=1) 

3. Mother's education (high school graduate or 
less=0, some college or more=1) 

4. English is first language (yes=1, no=0) 

5. Disability (yes=1, no=0) 

6. Years of English in high school (three years or 
less=0, four years or more=1) 

7. Years of mathematics in high school (three years or 
less=0, four years or more=1) 

8. Studied algebra in high school (yes=1, no=0) 

9. Years since mathematics training (less than one 
year=0, one year or more=1) 

10. Intended program: Associate's degree in arts and 
science (this program=1, all other programs=0) 

11. Intended program: Associate's degree in vocational 
field (this program=1, all other programs=0) 

12. Intended program: Undecided (this program=1, all 
other programs=0) 

13. CPTs Elementary Algebra test taken (yes=1, no=0) 

14. CPTs Arithmetic test taken (yes=1, no=0) 

15. CPTs Reading Comprehension test taken (yes=1, 
no=0) 

16. CPTs Sentence Skills test taken (yes=1, no=0) 

Planned comparisons of simple main effects of the 
experimental versus control group factor for each ethnic 
group (e.g., African-American examinees in the experimental 
group versus African-American examinees in the 
control group) and each sex (e.g., women in the experimental 
group versus women in the control group) were 
also conducted (Howell, 1997). 
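
The substitution rule for unreported covariates described earlier in this section (the mean response of examinees of the same ethnic group and sex in the same experimental or control group) can be sketched as follows. This is an illustration under stated assumptions, not the program actually used: the record layout, field names, and data are hypothetical, and only the mean case is shown (the report used the modal response where a mean was not appropriate).

```python
from collections import defaultdict
from statistics import mean


def impute_by_cell(records, covariate):
    """Return copies of the records with missing (None) covariate values
    replaced by the mean of the observed values in the same cell, where a
    cell is defined by (group, ethnicity, sex)."""
    observed = defaultdict(list)
    for rec in records:
        if rec[covariate] is not None:
            cell = (rec["group"], rec["ethnicity"], rec["sex"])
            observed[cell].append(rec[covariate])

    filled = []
    for rec in records:
        new_rec = dict(rec)  # leave the input records untouched
        if new_rec[covariate] is None:
            cell = (rec["group"], rec["ethnicity"], rec["sex"])
            # statistics.mean raises StatisticsError if the cell has no
            # observed values at all.
            new_rec[covariate] = mean(observed[cell])
        filled.append(new_rec)
    return filled


# Hypothetical examinee records.
examinees = [
    {"group": "experimental", "ethnicity": "white", "sex": "F", "age": 20},
    {"group": "experimental", "ethnicity": "white", "sex": "F", "age": 22},
    {"group": "experimental", "ethnicity": "white", "sex": "F", "age": None},
    {"group": "control", "ethnicity": "white", "sex": "F", "age": 35},
]
completed = impute_by_cell(examinees, "age")
```

Substituting within the examinee's own cell keeps the imputed value from shifting the cell mean, which is the quantity the covariance adjustment works with.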

Note that the analyses of covariance (and comparisons 
of simple main effects) use unweighted means. Effect 
sizes were assessed by the correlation ratio (η). Both 
statistical and practical significance were considered in 
evaluating the results. A .05 significance level and an η 
of .10 (Cohen's, 1988, definition of a "small" effect 
size) were employed throughout (including the comparisons 
of simple main effects; Keppel, 1982). 
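
As an illustration of the practical-significance criterion, the sketch below recovers a correlation ratio from an F ratio and its degrees of freedom. The report does not give the formula it used; the partial form η² = F·df_effect / (F·df_effect + df_error) is an assumption made here, and the function name is hypothetical. The two F ratios are taken from Table 3.

```python
import math


def eta_from_f(f_ratio, df_effect, df_error):
    """Correlation ratio (eta) recovered from an F ratio.

    Assumes the partial form eta^2 = F*df_effect / (F*df_effect + df_error);
    the report itself does not state which form was computed.
    """
    ss_ratio = f_ratio * df_effect
    return math.sqrt(ss_ratio / (ss_ratio + df_error))


ETA_MINIMUM = 0.10  # the "small" effect size threshold used in the report

# Sex effect on the Arithmetic score: F = 11.94 with 1 and 1,211 df.
eta_sex_arithmetic = eta_from_f(11.94, 1, 1211)

# Ethnicity effect on Reading Comprehension time: F = 9.06 with 2 and 1,017 df.
eta_ethnicity_reading_time = eta_from_f(9.06, 2, 1017)

print(eta_sex_arithmetic >= ETA_MINIMUM)
print(eta_ethnicity_reading_time >= ETA_MINIMUM)
```

With samples of this size an effect can be statistically significant and still fall below the .10 threshold, which is why both criteria were applied.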


Results and Discussion 

Intercorrelations 

The intercorrelations of the scores and times on the four 
tests for the experimental and control groups are reported 
in Table 2. The correlations were similar for the 
two groups. Scores on the two quantitative tests corre- 
lated highly (.70 and .71) as did scores on the two 
verbal tests (.73 and .78). Times on the two kinds of 
tests also correlated highly (.58 and .59 for quantitative 
tests, .75 and .76 for verbal tests). Apart from a sub- 
stantial correlation (.51 and .52) between score and 
time for Elementary Algebra, suggestive of speededness, 
the correlations between corresponding scores and 
times for the tests were modest. The correlations be- 
tween scores and times on different kinds of tests were 
also generally modest except for substantial correlations 
of time on Arithmetic with time on Reading Compre- 
hension (.51 and .59) and time on Sentence Skills (.53 
and .58). 

In short, apart from a few highly related scores or 
times, most of the variables were relatively independent 
of each other. The high correlations that were observed 
demonstrate that all of the scores and times had sub- 
stantial reliability. 


Analyses of Covariance 

The analyses of covariance of the scores and times on 
the tests, as well as the related planned comparisons, are 
summarized in Table 3; the corresponding means for 
the subgroups in the experimental and control groups 
appear in Table 4. 

Focusing on differences between the experimental 
and control groups for each ethnic group and sex, none 
of the eight two-way interactions of experimental versus 
control group with ethnicity, none of the eight two-way 
interactions of experimental versus control group with 
sex, and none of the eight three-way interactions of ex- 
perimental versus control group with ethnicity and sex 
were both statistically and practically significant. In ad- 
dition, none of the 24 simple main effects for ethnicity 
(white, African American, other) and none of the 16 
simple main effects for sex were significant. 

In brief, the test scores and times for an ethnic group 
or sex were unrelated to whether examinees were asked 
about their ethnicity or sex. 


Conclusions 

This study of the CPTs not only replicated the Stricker 
(1998) investigation of the AP Examination by failing to 
find a connection between inquiring about examinees' 
ethnicity and sex and how well they did on the tests but 
also extended the initial investigation by failing to find 
a connection with how long examinees worked on the 
tests. The convergence between the two studies, which 
differed in tests and test-taking populations, supports 
the generalizability of these negative outcomes and contrasts 
with the Steele and Aronson (1995) findings of 

Table 2 

Intercorrelations of Scores and Times on Tests 

Variable                           (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8) 

1. Elementary Algebra: Score       -      .51    .71    .05    .24    -.08   .28    -.11 
2. Elementary Algebra: Time        .52    -      .48    .59    .19    .34    .20    .30 
3. Arithmetic: Score               .70    .44    -      .20    .48    .01    .47    -.05 
4. Arithmetic: Time                .12    .58    .18    -      .03    .59    .04    .58 
5. Reading Comprehension: Score    .37    .28    .57    .12    -      -.07   .73    -.22 
6. Reading Comprehension: Time     -.08   .35    -.12   .51    -.06   -      -.12   .75 
7. Sentence Skills: Score          .44    .38    .57    .19    .78    -.05   -      -.22 
8. Sentence Skills: Time           -.17   .26    -.21   .53    -.22   .76    -.13   - 

Note: Correlations for the control group appear above the diagonal; correlations for the experimental group appear below it. Ns vary from 499 
to 656 for the control group, and from 439 to 581 for the experimental group. For Ns of 656 and 581, correlations of .07 and .08 are significant 
at the .05 level (two-tailed) respectively, and correlations of .10 are significant at the .01 level for both Ns. 



Table 3 

Summary of Analyses of Covariance of Scores and Times on Tests (F ratios) 

                                  Elementary Algebra    Arithmetic            Reading Comprehension Sentence Skills 
Source                        df  Score      Time       Score      Time       Score      Time       Score      Time 

Experimental-Control (E-C)    1   1.68       1.20       .12        2.20       .00        4.10*      .42        3.84* 
Sex                           1   7.41*      .13        11.94**    .32        .58        .61        .89        2.13 
E-C x Sex                     1   1.37       .36        .31        .04        .75        1.22       .87        .90 
  Male                        1   3.62       .15        .48        .96        .42        .50        .05        .66 
  Female                      1   .01        1.24       .02        1.25       .33        4.29       1.03       3.49 
Ethnicity                     2   28.17**†   .97        73.09**†   2.92*      58.18**†   9.06**†    51.10**†   22.63**† 
E-C x Ethnicity               2   1.08       .23        1.51       .18        .30        1.19       1.08       .54 
  White                       1   .08        2.08       2.17       2.08       .15        1.84       .03        1.25 
  African American            1   .18        .10        .08        2.95       .41        .02        .77        .52 
  Other                       1   2.15       .29        1.48       .08        .03        3.75       1.36       2.31 
Ethnicity x Sex               2   2.37       .46        3.28*      .71        .32        1.67       1.18       3.28* 
E-C x Ethnicity x Sex         2   .70        .07        .24        .48        1.62       1.07       4.14*      3.56* 

Note: The df for Error and Mean Square Error are 1,150 and 542.99 for Elementary Algebra Score, 1,150 and 132,605.00 for Elementary 
Algebra Time, 1,211 and 614.13 for Arithmetic Score, 1,210 and 188,970.50 for Arithmetic Time, 1,017 and 386.82 for Reading Comprehension 
Score, 1,017 and 280,482.20 for Reading Comprehension Time, 1,046 and 401.35 for Sentence Skills Score, and 1,046 and 222,828.70 
for Sentence Skills Time. *p < .05; **p < .01; †η > .10 


Table 4 

Mean Scores and Times on Tests for Ethnic Groups and Men and Women 

                          Ethnicity                                                   Sex 
                          White               African American    Other               Men                 Women 
Variable                  Exp       Con       Exp       Con       Exp       Con       Exp       Con       Exp       Con       S.D.(a) 

Elementary Algebra 
  Score                   52.31     51.80     40.17     41.15     45.51     51.86     47.37     51.69     44.63     44.86     22.91 
  Time                    676.95    633.93    630.13    618.52    684.82    647.40    660.73    646.64    667.21    619.93    364.15 
Arithmetic 
  Score                   68.44     65.51     48.01     47.32     54.68     60.23     59.75     61.41     54.34     53.96     24.78 
  Time                    1029.96   979.46    1103.87   1031.60   1083.82   1061.80   1059.93   1018.47   1085.18   1030.10   434.71 
Reading Comprehension 
  Score                   79.62     80.29     65.83     64.47     70.96     71.60     72.06     73.44     72.21     70.79     19.67 
  Time                    1452.16   1389.49   1580.01   1572.10   1629.25   1434.45   1546.98   1506.54   1560.63   1424.15   529.61 
Sentence Skills 
  Score                   86.37     86.68     74.00     72.14     72.11     76.87     77.48     77.02     77.51     80.11     20.03 
  Time                    1086.11   1040.62   1301.87   1265.73   1247.53   1101.29   1222.05   1182.74   1201.62   1089.03   472.05 

(a) Calculated from the Mean Square Errors in the analyses of covariance. 
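
The relation between the Mean Square Errors in the Table 3 note and the standard deviations in Table 4 can be sketched as a square root. This is a minimal illustration, assuming the standard deviation is simply the square root of the Mean Square Error; the dictionary names are hypothetical and the values are copied as printed in this report.

```python
import math

# Mean Square Errors as printed in the note to Table 3.
mean_square_error = {
    "Elementary Algebra Score": 542.99,
    "Elementary Algebra Time": 132605.00,
    "Arithmetic Score": 614.13,
    "Arithmetic Time": 188970.50,
    "Reading Comprehension Score": 386.82,
    "Reading Comprehension Time": 280482.20,
    "Sentence Skills Score": 401.35,
    "Sentence Skills Time": 222828.70,
}

# Standard deviations as printed in the last column of Table 4.
printed_sd = {
    "Elementary Algebra Score": 22.91,
    "Elementary Algebra Time": 364.15,
    "Arithmetic Score": 24.78,
    "Arithmetic Time": 434.71,
    "Reading Comprehension Score": 19.67,
    "Reading Comprehension Time": 529.61,
    "Sentence Skills Score": 20.03,
    "Sentence Skills Time": 472.05,
}

# Assumed relation: S.D. = square root of the Mean Square Error.
computed_sd = {name: math.sqrt(mse) for name, mse in mean_square_error.items()}

for name, sd in computed_sd.items():
    agrees = abs(sd - printed_sd[name]) < 0.005
    print(f"{name}: computed {sd:.2f}, printed {printed_sd[name]:.2f}, agrees={agrees}")
```

Applied to the values as printed, seven of the eight pairs agree to two decimals. The Elementary Algebra score pair (542.99 and 22.91) does not, which may reflect a transcription slip in one of the two figures.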



differences in test performance produced by inquiring 
about ethnicity. 

The present study, like the previous one, differed 
from the Steele and Aronson research in some respects 
that may account for the divergent findings, as already 
mentioned and as discussed in detail by Stricker. The 
Steele and Aronson research employed subjects in a laboratory 
study, whereas this investigation used examinees 
taking an operational test with real-life consequences. 
As a result, the CPTs examinees may be more 
motivated to do well on the tests, offsetting the adverse 
effects of stereotype threat. 

Relatedly, the Steele and Aronson subjects were led 
to believe that they were engaged in an innocuous 
problem-solving task, whereas the examinees in this 
study were aware that they were being tested for their 
academic skills. If examinees perceive the CPTs as diagnostic 
of their cognitive resources, thereby generating 
stereotype threat, it is entirely conceivable that inquiring 
about ethnicity and sex cannot further increase 
the stereotype threat. However, it is not at all certain 
that stereotype threat is actually at its maximum on 
these tests. 

Several processes suggested by Stricker that might 
account for the difference between his AP study and the 
Steele and Aronson research are made less plausible by 
differences in the research designs of the Stricker study 
and the present one. It was argued that attributions for 
poor performance may not be the same for the AP 
Examination and the GRE verbal items used by Steele 
and Aronson because the AP Examination is linked to a 
particular course. Thus, examinees may attribute their 
poor performance on the AP Examination to an inadequate 
course, not to their own characteristics or those of 
their ethnic group or sex, thereby blocking the effects of 
stereotype threat. But the CPTs are not tied to particular 
courses, making such attributions less likely, though 
examinees might still attribute their poor performance 
to substandard schooling in general. These remote attributions 
could also be made, though, by the subjects in 
the Steele and Aronson, and Spencer et al. (1997) research, 
for the test items that they used were similar in 
content to most of the CPTs (Steele and Aronson's GRE 
verbal items and CPTs Reading Comprehension, and 
Spencer et al.'s GRE quantitative and mathematics items 
and CPTs Elementary Algebra and CPTs Arithmetic). 
Nevertheless, the Steele and Aronson, and Spencer et al., 
test items were able to elicit stereotype threat. 

It was also suggested that stereotype threat in the 
Stricker study may have been vitiated by feedback 
during the course that either inoculated the examinees 
against stereotype threat or caused them to disidentify 
with the course material and thereby eliminated the ego 
involvement that stereotype threat requires to be effective. 
But no such feedback exists for the CPTs. 

Finally, it was proposed that examinees may perceive 
quantitative tests, such as the AP Calculus Examination 
in the Stricker study, as more difficult than verbal tests, 
such as the verbal items in the Steele and Aronson research. 
Insofar as a test is seen as beyond the examinee's 
ability level, stereotype threat may not operate. But 
both verbal and quantitative tests were used in the present 
study, and the same results were obtained with 
both kinds of tests. 

Other differences exist between the present study and 
the Steele and Aronson research but are unlikely to account 
for the divergent results. First, the two-year college 
students are probably less ego involved in the academic 
skills assessed by the tests and less able than the Stanford 
undergraduates in the Steele and Aronson research and 
the AP students in the Stricker investigation. This difference 
is unlikely to be important, for the Steele and 
Aronson and the Stricker findings diverged despite being 
based on essentially the same population of students. 

Second, the sample was large (totaling 1,341 examinees, 
including 468 African-American and 694 female 
test takers), roughly comparable in size to the sample in 
the Stricker research (though the sample of African-American 
examinees was much greater in the present 
study), and substantially larger than the samples in the 
Steele and Aronson experiments (44 and 20 African-American 
subjects). Hence, the statistical power to detect 
mean differences was appreciable in this study. 

Third, the CPTs, because they are computer adaptive, 
are geared to administer items at each examinee's 
ability level, with the result that he or she should be able 
to answer about 60 percent correctly (allowing for 
chance success). The Steele and Aronson research used 
a conventional testing approach, with all examinees 
being given the same items. The average item difficulty 
(mean percent correct) in the Steele and Aronson research 
was about 50 percent, comparable to the difficulty 
in the present study. 
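
One way to see where a figure of about 60 percent could come from is sketched below. The report does not say which item response model underlies the CPTs; the three-parameter logistic form, the five-option guessing value c = 0.2, and all names are assumptions made only for illustration.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct answer under a three-parameter logistic model.

    theta: examinee ability, a: discrimination, b: difficulty,
    c: lower asymptote reflecting chance success. All values used below
    are hypothetical.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# An item matched to the examinee (theta equal to b) would be answered
# correctly half the time without guessing; the chance component c raises
# that to c + (1 - c) / 2.
p_matched_no_guessing = p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.0)
p_matched_with_guessing = p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)

print(p_matched_no_guessing, p_matched_with_guessing)
```

Under these assumptions, an item targeted exactly at the examinee's ability level gives a success rate of one half before chance success and roughly 60 percent after it.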

Fourth, unlike the Steele and Aronson research, 
which adjusted for ability differences by covarying on 
the SAT® Verbal score (Donlon, 1984), no direct control 
for ability was employed. However, ability was indirectly 
controlled by covarying on amount of high 
school course work in English and mathematics, for 
such course work is substantially related to performance 
on ability tests (e.g., College Board, 1997; Laing, 
Engen, and Maxey, 1987; Morgan, 1989). Using course 
work as a covariate sidesteps the interpretive complexities 
inherent in employing as a covariate performance 
on a test that may also be susceptible to stereotype 
threat. Although control for ability is unneeded to evaluate 
differences between the experimental and control 
conditions within an ethnic group or sex, it is useful in 
comparing the interaction between ethnic group or sex 
and experimental versus control group in the Steele and 
Aronson research and the present study. 

Fifth, in contrast to the Steele and Aronson research 
(and the Stricker study), examinees were not randomly 
assigned to the experimental and control groups. However, 
randomization was approximated by assigning examinees 
tested at different time periods to groups, and 
a large number of covariates were used to adjust for any 
secular trends in the nature of examinees that might 
exist. It is still possible, though improbable, that unadjusted 
but relevant differences are present between the 
examinees in the experimental and control groups. 

Finally, a point of similarity between this study and 
the Steele and Aronson research lends support to the 
argument that any depersonalization associated with 
the group test administration in the Stricker study was 
unlikely to produce the different results in that study 
and in the Steele and Aronson research: examinees were 
individually tested (in this study and in one of the Steele 
and Aronson experiments, the testing was done by 
computer). 

The present study reinforces the findings of the 
Stricker research and rules out some, but not all, of the 
alternative explanations for the differences between 
these investigations and the seminal experiments by 
Steele and Aronson. It is becoming increasingly clear 
that simply asking about ethnicity and sex is unlikely to 
degrade the performance of examinees who take standardized 
tests in real-life settings. However, the broader 
consequences of stereotype threat for the functioning of 
these tests remain a matter of speculation that needs to 
be documented in field research. 


Appendix 


BACKGROUND INFORMATION 


LAST NAME 


FIRST NAME 


MIDDLE INITIAL 


STUDENT IDENTIFICATION NUMBER 
(Social Security Number) 

DATE OF BIRTH 

Month, followed by the day, followed by the year 
Examples: 10-23-70 for October 23, 1970 
01-01-70 for January 1, 1970 

TODAY’S DATE 


The following questions ask for information that will be useful in research and evaluation of the test. 
Your responses to these questions are voluntary. If you choose not to answer a question, select Omit as 
your response. 

1. What is the total number of years you studied English in high school (grades 9-12)? Count 
less than a full year of English as a full year, but do not count a repeated year of the same 
course as an additional year of study. 

One year or the equivalent 

Two years or the equivalent 

Three years or the equivalent 

Four years or the equivalent 

More than four years or the equivalent 

I did not take any courses in English 

Omit 


2. What is the total number of years you studied mathematics in high school (grades 9-12)? Count 
less than a full year of mathematics as a full year, but do not count a repeated year of the same 
course as an additional year of study. 

One year or the equivalent 

Two years or the equivalent 

Three years or the equivalent 

Four years or the equivalent 

More than four years or the equivalent 

I did not take any courses in mathematics 

Omit 


3. Did you study algebra for at least one semester in high school? 

Yes 

No 

Omit 



4. How long has it been since you have taken a mathematics course or other formal 
mathematics training? 

Less than one year 

One to three years 

Four to six years 

Seven or more years 

Omit 

5. What is your sex? 

Female 

Male 

Omit 

6. How do you describe yourself? 

Native American, American Indian, or Alaskan Native 

Black or African American 

Mexican American or Chicano 

Puerto Rican 

Other Hispanic, Latino, Central American, or South American 

Asian or Pacific American 

White (non-Hispanic) or Caucasian 

Other 

Omit 

7. Is English the first language you learned? 

Yes 

No 

Omit 


8. What documented disabling condition do you have, if any, that might affect the usefulness 

of your test scores as measures of your skills? (Select only one.) Upon receiving your results 
you may wish to contact student services for advice. 

None 

Blindness or other visual impairment 

Deafness or other hearing impairment 

Paraplegia 

Learning Disability 

Other neurological or orthopedic impairment 

Other 

Omit 



9. What is the highest level of education completed by your father or male guardian? 

Grade school or less 

Some high school 

High school diploma or equivalent 

Business or trade school 

Some college 

Associate degree 

Bachelor’s degree 

Some graduate or professional school 

Completed graduate or professional school 

Omit 

10. What is the highest level of education completed by your mother or female guardian? 

Grade school or less 

Some high school 

High school diploma or equivalent 

Business or trade school 

Some college 

Associate degree 

Bachelor’s degree 

Some graduate or professional school 

Completed graduate or professional school 

Omit 



